
    Faster Compact On-Line Lempel-Ziv Factorization

    We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N\log N)$ time and uses only $O(N\log\sigma)$ bits of working space, where $N$ is the length of the string and $\sigma$ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either $O(N\log^3 N)$ time (Okanohara & Sadakane 2009) or $O(N\log^2 N)$ time (Starikovskaya 2012). The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs (Blumer et al. 1985). We also present an opportunistic variant of our algorithm, which, given the run length encoding of size $m$ of a string of length $N$, computes the Lempel-Ziv factorization on-line, in $O\left(m \cdot \min \left\{\frac{(\log\log m)(\log \log N)}{\log\log\log N}, \sqrt{\frac{\log m}{\log \log m}} \right\}\right)$ time and $O(m\log N)$ bits of space, which is faster and more space efficient when the string is run-length compressible.
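    To make the objects in this abstract concrete, here is a minimal sketch of a Lempel-Ziv factorization and of a run-length encoding. It is a naive quadratic-or-worse scan, not the DAWG-based algorithm the abstract describes. The factor definition used here (each factor is the longest prefix of the remaining text that also starts at an earlier position, overlaps allowed, or a single fresh character) is an assumption, since the abstract does not spell out which variant it uses. The function names `lz_factorize` and `run_length_encode` are hypothetical.

```python
from itertools import groupby


def lz_factorize(s):
    """Naive LZ77-style factorization (illustrative only, not the paper's method).

    Assumed definition: the factor starting at position i is the longest
    prefix of s[i:] that also occurs starting at some earlier position j < i
    (the earlier occurrence may overlap the factor), or the single character
    s[i] if no such non-empty prefix exists.
    """
    factors = []
    n = len(s)
    i = 0
    while i < n:
        best = 0
        # Try every earlier start position j and extend the match greedily.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a factor of length 1
        factors.append(s[i:i + step])
        i += step
    return factors


def run_length_encode(s):
    """Run-length encoding as a list of (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


if __name__ == "__main__":
    print(lz_factorize("abababab"))
    print(run_length_encode("aaabbc"))
```

    The factors are processed strictly left to right, which is the sense in which a factorization can be computed on-line. The number of pairs returned by `run_length_encode` corresponds to the size $m$ of the run-length encoding mentioned in the abstract.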

    Fully dynamic data structure for LCE queries in compressed space

    A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding $\mathcal{G}$ of size $w = O(\min(z \log N \log^* M, N))$ [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of $T$, which can be seen as a compressed representation of $T$, can support LCE queries in $O(\log N + \log \ell \log^* M)$ time, where $\ell$ is the answer to the query, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, and $M \geq 4N$ is an integer that can be handled in constant time under the word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, $\mathcal{G}$ can be enhanced to support efficient update operations: after processing $\mathcal{G}$ in $O(w f_{\mathcal{A}})$ time, we can insert/delete any (sub)string of length $y$ into/from an arbitrary position of $T$ in $O((y + \log N \log^* M) f_{\mathcal{A}})$ time, where $f_{\mathcal{A}} = O\left(\min \left\{ \frac{\log\log M \log\log w}{\log\log\log M}, \sqrt{\frac{\log w}{\log\log w}} \right\}\right)$. This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: we can construct $\mathcal{G}$ in $O(N f_{\mathcal{A}})$ time from an uncompressed string $T$; in $O(n \log\log n \log N \log^* M)$ time from a grammar-compressed string $T$ represented by a straight-line program of size $n$; and in $O(z f_{\mathcal{A}} \log N \log^* M)$ time from an LZ77-compressed string $T$ with $z$ factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing. Comment: arXiv admin note: text overlap with arXiv:1504.0695
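    As a point of reference for what an LCE query returns, the following sketch answers it by direct character comparison on an uncompressed string, in time proportional to the answer $\ell$. It does not implement the signature encoding $\mathcal{G}$ or its update operations. Positions are assumed to be 0-indexed, and the helper names `lce` and `insert_substring` are hypothetical.

```python
def lce(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:].

    Naive scan, assuming 0-indexed positions; runs in O(answer) time on
    an uncompressed string. Illustrative only.
    """
    n = len(text)
    length = 0
    while (i + length < n and j + length < n
           and text[i + length] == text[j + length]):
        length += 1
    return length


def insert_substring(text, pos, piece):
    """Insert `piece` into `text` at position `pos` by rebuilding the string.

    This costs O(len(text)) time; it only illustrates what a dynamic update
    does to later LCE answers.
    """
    return text[:pos] + piece + text[pos:]


if __name__ == "__main__":
    t = "abab"
    print(lce(t, 0, 2))               # compare "abab" with "ab"
    t = insert_substring(t, 4, "ab")  # text becomes "ababab"
    print(lce(t, 0, 2))               # compare "ababab" with "abab"
```

    The second query shows why a dynamic structure is useful: an insertion can change the answer for positions that were queried before, and rebuilding an index from scratch after every edit would cost time proportional to the whole text.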